
Real-Time Streaming Multi-Pattern Search for Constant Alphabet

Author: Golan Shay
Porat Ely
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 25th Annual European Symposium on Algorithms (ESA 2017)
Publication date: 01/01/2017

In the streaming multi-pattern search problem, which is also known as the streaming dictionary matching problem, a set D={P_1,P_2,...,P_d} of d patterns (strings over an alphabet Sigma), called the dictionary, is given to be preprocessed. Then, a text T arrives one character at a time, and the goal is to report, before the next character arrives, the longest pattern in the dictionary that is a current suffix of T. We prove that for a constant-size alphabet, there exists a randomized Monte-Carlo algorithm for the streaming dictionary matching problem that takes constant time per character and uses O(d log m) words of space, where m is the length of the longest pattern in the dictionary. In the case where the alphabet size is not constant, we introduce two new randomized Monte-Carlo algorithms with the following complexities:
* O(log log |Sigma|) time per character in the worst case and O(d log m) words of space.
* O(1/epsilon) time per character in the worst case and O(d |Sigma|^epsilon log m/epsilon) words of space for any 0<epsilon<=1.
These results improve upon the algorithm of [Clifford et al., ESA '15], which uses O(d log m) words of space and takes O(log log (m+d)) time per character.
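To make the problem definition concrete, here is a brute-force reference sketch. It is not the algorithm from the paper: it simply keeps the last m characters in a buffer (O(m) space) and, after each character, checks every pattern in O(d*m) time. The function name `stream_longest_suffix_match` is hypothetical, and the dictionary is assumed to be non-empty.

```python
from collections import deque


def stream_longest_suffix_match(dictionary, text):
    """Hypothetical brute-force reference for streaming dictionary matching.

    After each arriving character, yield the longest pattern in `dictionary`
    that is a suffix of the text read so far, or None if there is none.
    Uses O(m) space for the window and O(d*m) time per character, far from
    the bounds stated in the abstract.
    """
    m = max(len(p) for p in dictionary)  # assumes a non-empty dictionary
    # Longest patterns first, so the first hit is the longest match.
    patterns = sorted(set(dictionary), key=len, reverse=True)
    window = deque(maxlen=m)  # only the last m characters can matter
    for ch in text:
        window.append(ch)
        suffix = "".join(window)
        yield next((p for p in patterns if suffix.endswith(p)), None)
```

For example, `list(stream_longest_suffix_match(["ab", "b", "cab"], "cabb"))` gives `[None, None, "cab", "b"]`: after the third character the whole pattern `cab` is a suffix, and after the fourth only `b` is.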

Dagstuhl Research Online Publication Server

Streaming Pattern Matching with d Wildcards

Author: Golan Shay
Kopelowitz Tsvi
Porat Ely
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th Annual European Symposium on Algorithms (ESA 2016)
Publication date: 01/01/2016

In the pattern matching with d wildcards problem, we are given a text T of length n and a pattern P of length m that contains d wildcard characters, each denoted by a special symbol '?'. A wildcard character matches any other character. The goal is to establish for each m-length substring of T whether it matches P. In the streaming model variant of the pattern matching with d wildcards problem, the text T arrives one character at a time, and the goal is to report, before the next character arrives, whether the last m characters match P while using only o(m) words of space. In this paper we introduce two new algorithms for the d wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant 0<=delta<=1. This algorithm uses ~O(d^{1-delta}) amortized time per character and ~O(d^{1+delta}) words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm that uses O(d+log m) worst-case time per character and O(d log m) words of space.
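The matching rule in the problem definition can be illustrated with an offline brute-force check. This sketch is not either of the streaming algorithms from the paper: it examines every m-length window of T directly in O(n*m) time. The function name `wildcard_matches` is hypothetical, and '?' is assumed to be the wildcard symbol, as in the abstract.

```python
def wildcard_matches(text, pattern, wildcard="?"):
    """Hypothetical offline reference for pattern matching with wildcards.

    Return every start index i such that text[i:i+m] matches `pattern`,
    where the `wildcard` symbol in the pattern matches any character.
    Runs in O(n*m) time.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        # Each pattern position must be a wildcard or equal the text character.
        if all(p == wildcard or p == t for p, t in zip(pattern, text[i:i + m]))
    ]
```

For example, `wildcard_matches("abcabd", "ab?")` returns `[0, 3]`, since both `abc` and `abd` match `ab?`.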

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Locally Consistent Parsing for Text Indexing in Small Space

Author: Birenzwige Or
Golan Shay
Porat Ely
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2020

We consider two closely related problems of text indexing in a sub-linear working space. The first problem is the Sparse Suffix Tree (SST) construction of a set of suffixes $B$ using only $O(|B|)$ words of space. The second problem is the Longest Common Extension (LCE) problem, where for some parameter $1 \le \tau \le n$, the goal is to construct a data structure that uses $O(\frac{n}{\tau})$ words of space and can compute the longest common prefix length of any pair of suffixes. We show how to use ideas based on the Locally Consistent Parsing technique, which was introduced by Sahinalp and Vishkin [STOC '94], in some non-trivial ways in order to improve the known results for the above problems. We introduce new Las-Vegas and deterministic algorithms for both problems. We introduce the first Las-Vegas SST construction algorithm that takes $O(n)$ time. This is an improvement over the last result of Gawrychowski and Kociumaka [SODA '17], who obtained $O(n)$ time for a Monte-Carlo algorithm and $O(n\sqrt{\log |B|})$ time for a Las-Vegas algorithm. In addition, we introduce a randomized Las-Vegas construction for an LCE data structure that can be constructed in linear time and answers queries in $O(\tau)$ time. For the deterministic algorithms, we introduce an SST construction algorithm that takes $O(n\log \frac{n}{|B|})$ time (for $|B|=\Omega(\log n)$). This is the first almost linear time, $O(n \cdot \mathrm{poly}\log n)$, deterministic SST construction algorithm, where all previous algorithms take at least $\Omega\left(\min\{n|B|,\frac{n^2}{|B|}\}\right)$ time. For the LCE problem, we introduce a data structure that answers LCE queries in $O(\tau\sqrt{\log^* n})$ time, with $O(n\log\tau)$ construction time (for $\tau=O(\frac{n}{\log n})$). This data structure improves both query time and construction time upon the results of Tanimura et al. [CPM '16]. Comment: Extended abstract to appear in SODA 202
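The two problems can be illustrated with naive baselines. The sketch below is not the Locally Consistent Parsing approach from the paper: `lce` answers a Longest Common Extension query by direct character comparison in $O(n)$ time and $O(1)$ extra space, and `sparse_suffix_order` uses it as a comparator to sort only the suffixes starting at the positions in $B$, which yields the left-to-right leaf order of the sparse suffix tree. Both function names are hypothetical.

```python
from functools import cmp_to_key


def lce(text, i, j):
    """Hypothetical naive LCE query: length of the longest common prefix of
    the suffixes text[i:] and text[j:], found by direct scanning."""
    n = len(text)
    length = 0
    while i + length < n and j + length < n and text[i + length] == text[j + length]:
        length += 1
    return length


def sparse_suffix_order(text, positions):
    """Hypothetical sketch: sort the suffixes starting at `positions` (the set B)
    lexicographically, using LCE to jump to the first differing character."""
    n = len(text)

    def compare(i, j):
        l = lce(text, i, j)
        # An exhausted suffix ("") sorts before any character.
        a = text[i + l] if i + l < n else ""
        b = text[j + l] if j + l < n else ""
        return (a > b) - (a < b)

    return sorted(positions, key=cmp_to_key(compare))
```

For example, in `banana` the suffixes at positions 1, 3 and 5 are `anana`, `ana` and `a`, so `lce("banana", 1, 3)` is 3 and `sparse_suffix_order("banana", [1, 3, 5])` returns `[5, 3, 1]`.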

arXiv.org e-Print Archive

Crossref

Towards Optimal Approximate Streaming Pattern Matching by Matching Multiple Patterns in Multiple Streams

Author: Golan Shay
Kopelowitz Tsvi
Porat Ely
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Publication date: 01/01/2018

Recently, there has been a growing focus on solving approximate pattern matching problems in the streaming model. Of particular interest are the pattern matching with k-mismatches (KMM) problem and the pattern matching with w-wildcards (PMWC) problem. Motivated by reductions from these problems in the streaming model to the dictionary matching problem, this paper focuses on designing algorithms for the dictionary matching problem in the multi-stream model, where there are several independent streams of data (as opposed to just one in the streaming model), and the memory complexity of an algorithm is expressed using two quantities: (1) a read-only shared memory storage area which is shared among all the streams, and (2) local stream memory that each stream stores separately. In the dictionary matching problem in the multi-stream model, the goal is to preprocess a dictionary D={P_1,P_2,...,P_d} of d=|D| patterns (strings with maximum length m over alphabet Sigma) into a data structure stored in shared memory, so that given multiple independent streaming texts (where characters arrive one at a time) the algorithm reports occurrences of patterns from D in each one of the texts as soon as they appear. We design two efficient algorithms for the dictionary matching problem in the multi-stream model. The first algorithm works when all the patterns in D have the same length m and costs O(d log m) words in shared memory, O(log m log d) words in stream memory, and O(log m) time per character. The second algorithm works for general D, but the time cost per character becomes O(log m+log d log log d). We also demonstrate the usefulness of our first algorithm in solving both the KMM problem and the PMWC problem in the streaming model. In particular, we obtain the first almost optimal (up to poly-log factors) algorithm for the PMWC problem in the streaming model.
We also design a new algorithm for the KMM problem in the streaming model that, up to poly-log factors, has the same bounds as the most recent results that use different techniques. Moreover, for most inputs, our algorithm for KMM is significantly faster on average.
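The split between shared and per-stream memory in the multi-stream model can be sketched as follows. This is only an illustration of the memory model, not the paper's algorithm: the shared part is just the read-only pattern set, and each stream keeps a naive O(m)-character buffer rather than the O(log m log d) words stated above. The class names `SharedDictionary` and `StreamState` are hypothetical, and the dictionary is assumed to be non-empty.

```python
class SharedDictionary:
    """Hypothetical read-only structure shared by all streams (here: the patterns)."""

    def __init__(self, patterns):
        self.patterns = frozenset(patterns)  # immutable, safe to share
        self.m = max(len(p) for p in self.patterns)  # assumes a non-empty dictionary


class StreamState:
    """Hypothetical per-stream local memory: a buffer of the last m characters."""

    def __init__(self, shared):
        self.shared = shared
        self.buffer = ""

    def push(self, ch):
        """Consume one character and return, sorted, the patterns ending here."""
        self.buffer = (self.buffer + ch)[-self.shared.m:]
        return sorted(p for p in self.shared.patterns if self.buffer.endswith(p))
```

With `shared = SharedDictionary(["ab", "b"])`, two independent streams `s1 = StreamState(shared)` and `s2 = StreamState(shared)` share one dictionary: `s1.push("a")` returns `[]`, `s2.push("b")` returns `["b"]`, and then `s1.push("b")` returns `["ab", "b"]`.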

Dagstuhl Research Online Publication Server

The Streaming k-Mismatch Problem: Tradeoffs Between Space and Total Time

Author: Golan Shay
Kociumaka Tomasz
Kopelowitz Tsvi
Porat Ely
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Publication date: 01/01/2020

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$\sigma$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac{nk^2}{m},\frac{nk}{\sqrt s},\frac{\sigma nm}{s}\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},\sigma n\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$. Comment: Extended abstract to appear in CPM 202
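For reference, the quantity being reported can be computed by an offline brute force. This sketch is not the streaming algorithm from the paper and makes no attempt at its space or time bounds: for every alignment it counts mismatches in $O(m)$ time and reports the Hamming distance only when it is at most $k$. The function name `k_mismatch` is hypothetical.

```python
def k_mismatch(text, pattern, k):
    """Hypothetical offline reference for the k-mismatch problem.

    For each alignment i, report the Hamming distance between `pattern` and
    text[i:i+m] if it is at most k, and None otherwise. O(n*m) total time.
    """
    m = len(pattern)
    out = []
    for i in range(len(text) - m + 1):
        dist = sum(a != b for a, b in zip(pattern, text[i:i + m]))
        out.append(dist if dist <= k else None)
    return out
```

For example, `k_mismatch("abcabd", "abc", 1)` returns `[0, None, None, 1]`: the pattern occurs exactly at position 0, is within one mismatch at position 3, and has three mismatches at positions 1 and 2.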

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Improved Circular k-Mismatch Sketches

Author: Golan Shay
Kociumaka Tomasz
Kopelowitz Tsvi
Porat Ely
Uznański Przemysław
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2020)
Publication date: 01/01/2020

The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability. This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design a $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Interpreting the results of chemical stone analysis in the era of modern stone analysis techniques

Author: Gilad Ron
Golan Shay
Holland Ronen
Lifshitz David
Ruth Tor
Usman Kalba D.
Williams James C., Jr.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/02/2017

INTRODUCTION AND OBJECTIVE: Stone analysis should be performed in all first-time stone formers. The preferred analytical procedures are Fourier-transform infrared spectroscopy (FT-IR) or X-ray diffraction (XRD). However, due to limited resources, chemical analysis (CA) is still in use throughout the world. The aim of the study was to compare FT-IR and CA in well-matched stone specimens and to characterize the pros and cons of CA. METHODS: In a prospective bi-center study, urinary stones were retrieved from 60 consecutive endoscopic procedures. To ensure that identical stone samples were sent for analyses, the samples were first analyzed by micro-computed tomography to assess the uniformity of each specimen before being submitted for FT-IR and CA. RESULTS: Overall, the results of CA did not match the FT-IR results in 56% of the cases. In 16% of the cases CA missed the major stone component, and in 40% it missed the minor stone component. By FT-IR, 37 of the 60 specimens contained CaOx as the major component, whereas CA reported CaOx as the major component in 47/60, resulting in high sensitivity but very poor specificity. CA was relatively accurate for UA and cystine. CA missed struvite and calcium phosphate as a major component in all cases. In mixed stones, the sensitivity of CA for the minor component was poor, generally less than 50%. CONCLUSIONS: Urinary stone analysis using CA provides only limited data that should be interpreted carefully. Urinary stone analysis using CA is likely to result in clinically significant errors in its assessment of stone composition. Although the monetary costs of CA are relatively modest, this method does not provide the level of analytical specificity required for proper management of patients with metabolic stones.

IUPUIScholarWorks

A clinical evaluation of an ex vivo organ culture system to predict patient response to cancer therapy

Author: Adi Zundelevich
Amir Sonnenblick
Aviad Zick
Ayala Hubert
Boris Chertin
Chani Stossel
Dan Leibovici
Daniel Kedar
Dmitry Koulikov
Eli Rosenbaum
Erez Stossel
German Creiderman
Giuseppe Mallel
Guy Neev
Hagit Shapira
Hamutal Shahar
Hovav Nechushtan
Liat Applebaum
Lubov Turovsky
Nancy Gavert
Raanan Berger
Ravid Straussman
Sara Aharon
Seth J. Salpeter
Shani Breuer
Sharon Halperin
Shay Golan
Talia Golan
Tamar Peretz
Vered Bar
Yakir Rottenberg
Zohar Dotan
Publication venue: Frontiers Media S.A.
Publication date: 01/09/2023

Introduction: Ex vivo organ cultures (EVOC) were recently optimized to sustain cancer tissue for 5 days with its complete microenvironment. We examined the ability of an EVOC platform to predict patient response to cancer therapy. Methods: A multicenter, prospective, single-arm observational trial. Samples were obtained from patients with newly diagnosed bladder cancer who underwent transurethral resection of bladder tumor and from core needle biopsies of patients with metastatic cancer. The tumors were cut into 250 μm slices and cultured within 24 h, then incubated for 96 h with vehicle or the intended treatment drug. The cultures were then fixed and stained to analyze their morphology and cell viability. Each EVOC was given a score based on cell viability, level of damage, and Ki67 proliferation, and the scores were correlated with the patients' clinical response assessed by pathology or Response Evaluation Criteria in Solid Tumors (RECIST). Results: The cancer tissue and microenvironment, including endothelial and immune cells, were preserved at high viability with continued cell division for 5 days, demonstrating active cell signaling dynamics. A total of 34 cancer samples were tested by the platform and were correlated with clinical results. A higher EVOC score was correlated with better clinical response. The EVOC system showed a predictive specificity of 77.7% (7/9, 95% CI 0.4–0.97) and a sensitivity of 96% (24/25, 95% CI 0.80–0.99). Conclusion: EVOC cultured for 5 days showed high sensitivity and specificity for predicting clinical response to therapy among patients with muscle-invasive bladder cancer and other solid tumors.

Directory of Open Access Journals

Emerging roles of hnRNPA1 in modulating malignant transformation

Author: Allemand
Arnaud
Azzalin
Azzalin
Babic
Bates
Beijersbergen
Beretta
Bevilacqua
Beyer
Bikfalvi
Birney
Bishop
Blackburn
Bonnal
Bonomi
Brockstedt
Brooks
Burd
Buvoli
Cammas
Caputi
Chatterjee
Chellappan
Chen
Chen
Chen
Christofk
Clower
Cogoi
David
Dean
Del Gatto
Denchi
Deng
Dick
Ding
Dreyfuss
Dreyfuss
Dreyfuss
Durie
Durkin
Dvinge
Faye
Feuerhahn
Fiset
Flynn
Ford
Frenzel
Galy
Galy
Ghigna
Glover
Golan-Gerstl
Guil
Guo
Hamilton
Holcik
Holcik
Idriss
Iervolino
Jean-Philippe
Jean-Philippe
Jo
Jordan
Karn
Karni
Keely
Kenan
Kim
Ko
LaBranche
LaCasse
Le
Lewis
Li
Liu
Liwak
Loh
Louderbough
Martin
Matter
Mayeda
Municio
Nadler
Nguyen
O'Sullivan
Pandya
Pardo
Patry
Pavlova
Pelisch
Petersen
Pino
Pinol-Roma
Pinol-Roma
Radisky
Rajpurohit
Redon
Redon
Ringel
Roy
Schlosser
Schnelzer
Schoeftner
Screaton
Shay
Shen
Shi
Shi
Shiio
Siomi
Sui
Tahara
Thiery
Ting
Vagner
Varfolomeev
Wang
Warnakulasuriyarachchi
Wold
Wong
Xu
Yeung
Yoon
Yu
Yunis
Zhang
Zhao
Zhou
Zhu
Zou
Zou
Publication venue: Wiley
Publication date: 22/05/2017

Heterogeneous nuclear ribonucleoproteins (hnRNPs) are RNA-binding proteins associated with complex and diverse biological processes such as processing of heterogeneous nuclear RNAs (hnRNAs) into mature mRNAs, RNA splicing, transactivation of gene expression, and modulation of protein translation. hnRNPA1 is the most abundant and ubiquitously expressed member of this protein family and has been shown to be involved in multiple molecular events driving malignant transformation. In addition to selective mRNA splicing events promoting expression of specific protein variants, hnRNPA1 regulates the gene expression and translation of several key players associated with tumorigenesis and cancer progression. Here, we will summarize our current knowledge of the involvement of hnRNPA1 in cancer, including its roles in regulating cell proliferation, invasiveness, metabolism, adaptation to stress and immortalization

Crossref

Spiral - Imperial College Digital Repository